evaluation result
- Media > Music (0.68)
- Leisure & Entertainment (0.68)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Speech (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.64)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (4 more...)
A Training and evaluation
All models were trained on single GPUs, except for SchNet when trained on OC20-2M, which required 3 GPUs. Tables 9-12 present the extended results on OC20 across the 4 separate S2EF validation sets. Table 9: Evaluation results on the OC20 S2EF in-distribution validation set. In Table 13, we present the performance and inference throughput of the baseline models on COLL. Table 13: Evaluation of the performance of the four baseline models on the COLL dataset (columns: Model; Energy MAE; Force MAE; Force cos; EFwT; inference throughput in samples / GPU sec. on the COLL test set).
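The metric names in the Table 13 header can be made concrete. Below is a minimal NumPy sketch of the four S2EF metrics (energy MAE, force MAE, force cosine, EFwT). The function name is illustrative, and the EFwT thresholds (0.02 eV and 0.03 eV/Å, the OC20 convention) are assumptions that should be checked against the paper's setup.

```python
import numpy as np

def s2ef_metrics(e_pred, e_true, f_pred, f_true,
                 e_thresh=0.02, f_thresh=0.03):
    """Sketch of the four S2EF metrics.

    e_pred/e_true: (N,) predicted/reference energies per structure.
    f_pred/f_true: lists of (n_atoms_i, 3) force arrays, one per structure.
    """
    e_pred, e_true = np.asarray(e_pred), np.asarray(e_true)
    energy_mae = np.abs(e_pred - e_true).mean()

    fp = np.concatenate(f_pred)                     # (total_atoms, 3)
    ft = np.concatenate(f_true)
    force_mae = np.abs(fp - ft).mean()
    # per-atom cosine similarity between predicted and true force vectors
    cos = (fp * ft).sum(1) / (np.linalg.norm(fp, axis=1)
                              * np.linalg.norm(ft, axis=1) + 1e-12)
    force_cos = cos.mean()

    # EFwT: fraction of structures whose energy error is below e_thresh
    # AND whose largest per-component force error is below f_thresh
    efwt = np.mean([
        abs(ep - et) < e_thresh and np.abs(p - t).max() < f_thresh
        for ep, et, p, t in zip(e_pred, e_true, f_pred, f_true)
    ])
    return dict(energy_mae=energy_mae, force_mae=force_mae,
                force_cos=force_cos, efwt=efwt)
```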
An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation
Shirafuji, Daiki, Saito, Tatsuhiko, Kimura, Yasutomo
Large language models (LLMs) are known to inherit and even amplify societal biases present in their pre-training corpora, threatening fairness and social trust. To address this issue, recent work has explored "editing" LLM parameters to mitigate social bias via model merging approaches; however, no empirical comparison of these approaches exists. In this work, we empirically survey seven algorithms: Linear, Karcher Mean, SLERP, NuSLERP, TIES, DELLA, and Nearswap, applied to 13 open-weight models in the GPT, LLaMA, and Qwen families. We perform a comprehensive evaluation using three bias datasets (BBQ, BOLD, and HONEST) and measure the impact of these techniques on LLM performance in downstream tasks of the SuperGLUE benchmark. We find a trade-off between bias reduction and downstream performance: methods achieving greater bias mitigation degrade accuracy, particularly on tasks requiring reading comprehension and commonsense and causal reasoning. Among the merging algorithms, Linear, SLERP, and Nearswap consistently reduce bias while maintaining overall performance, with SLERP at moderate interpolation weights emerging as the most balanced choice. These results highlight the potential of model merging algorithms for bias mitigation, while indicating that excessive debiasing or inappropriate merging methods can degrade important linguistic abilities.
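For context on the algorithms surveyed, here is a minimal sketch of SLERP applied to two flattened weight tensors, as commonly implemented in merging toolkits. The function name, the near-parallel fallback threshold, and applying the operation per-tensor are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight vectors.

    t=0 returns w_a, t=1 returns w_b; intermediate t follows the arc
    between the two directions. Falls back to linear interpolation when
    the vectors are nearly parallel and SLERP degenerates numerically.
    """
    a = w_a / np.linalg.norm(w_a)
    b = w_b / np.linalg.norm(w_b)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    theta = np.arccos(dot)              # angle between weight directions
    if theta < 1e-6:                    # nearly parallel: use LERP
        return (1 - t) * w_a + t * w_b
    return (np.sin((1 - t) * theta) * w_a
            + np.sin(t * theta) * w_b) / np.sin(theta)
```

In practice each parameter tensor of the two models would be flattened, interpolated at the chosen weight t (the "moderate interpolation weights" the abstract refers to), and reshaped back.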
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- (4 more...)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Texas (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (3 more...)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Texas (0.04)
- (5 more...)
- Health & Medicine (0.94)
- Education (0.68)
Generalized Inequality-based Approach for Probabilistic WCET Estimation
Toba, Hayate, Yano, Atsushi, Azumi, Takuya
Estimating the probabilistic Worst-Case Execution Time (pWCET) is essential for ensuring the timing correctness of real-time applications, such as robot IoT systems and autonomous driving systems. While methods based on Extreme Value Theory (EVT) can provide tight bounds, they suffer from model uncertainty due to the need to decide where the upper tail of the distribution begins. Conversely, inequality-based approaches avoid this issue but can yield pessimistic results for heavy-tailed distributions. This paper proposes a method to reduce such pessimism by incorporating saturating functions (arctangent and hyperbolic tangent) into Chebyshev's inequality, which mitigates the influence of large outliers while preserving mathematical soundness. Evaluations on synthetic and real-world data from the Autoware autonomous driving stack demonstrate that the proposed method achieves safe and tighter bounds for such distributions.
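The abstract does not give the construction in full; the sketch below illustrates the underlying idea under one plausible reading: apply a monotone saturating map (here tanh with an assumed scale parameter) to both the samples and the threshold, then apply the one-sided Chebyshev (Cantelli) inequality in the transformed domain. Because the map is monotone increasing, the tail event is preserved and the bound remains sound, while large outliers inflate the variance far less. Function and parameter names are illustrative, not the paper's.

```python
import numpy as np

def cantelli_tail(samples, d, scale=None):
    """Upper bound on P(X >= d) via the one-sided Chebyshev
    (Cantelli) inequality: P(X - mu >= g) <= var / (var + g^2).

    If `scale` is given, samples and threshold are first passed through
    the monotone map tanh(x / scale); since
    P(X >= d) = P(tanh(X/s) >= tanh(d/s)), the transformed-domain bound
    still upper-bounds the original tail probability.
    """
    x = np.asarray(samples, dtype=float)
    if scale is not None:
        x = np.tanh(x / scale)
        d = np.tanh(d / scale)
    mu, var = x.mean(), x.var(ddof=1)
    gap = d - mu
    if gap <= 0:
        return 1.0          # bound is vacuous at or below the mean
    return min(1.0, var / (var + gap ** 2))
```

On a heavy-tailed sample (many small execution times plus one large outlier), the saturated bound is markedly tighter than the plain Cantelli bound while still covering the empirical tail probability.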
- Information Technology (0.75)
- Transportation > Ground > Road (0.55)
- Automobiles & Trucks (0.55)
Appendix A Model and training procedure: details
All experiments used the same model and training procedure, unless stated otherwise. We used a ResNet with two blocks per group, channels per group of (16, 32, 32, 64), and no pre-training. The integer labels were embedded using a standard embedding layer. In all figures, (shaded) error bars indicate the standard deviation around the mean. As a future extension, the model could be extended to handle novel labels as well.
Adversarially Robust 3D Point Cloud Recognition Using Self-Supervisions: Supplementary Materials
Jiachen Sun
Figure A. DGCNN leverages EdgeConv as its basic operation to extract features. Please refer to our codebase for detailed parameters such as batch normalization and activation functions. A.2 Self-Supervised Learning Task We follow exactly the same settings as Poursaeed et al. [8] and Sauder et al. [9] for 3D rotation and [...]. We illustrate the FoldingNet architecture in Figure C. In this section, we first introduce the detailed formulations of the adopted attack methods. B.1 Attack Method We introduce the detailed formulation of the attack methods used in our study.
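Since the attack formulations themselves are not reproduced in this excerpt, the following is a generic L-infinity PGD sketch (a standard perturbation attack on point coordinates), not the paper's exact formulation. `grad_fn` stands in for autograd through the victim model, and all hyperparameters are illustrative.

```python
import numpy as np

def pgd_attack(points, grad_fn, eps=0.05, alpha=0.01, steps=10):
    """Sketch of an L-infinity PGD perturbation attack on a point cloud.

    points:  (N, 3) array of xyz coordinates.
    grad_fn: callable returning dLoss/dPoints for the current cloud
             (a stand-in for backpropagation through the model).
    """
    x0 = np.asarray(points, dtype=float)
    # random start inside the eps-ball
    x = x0 + np.random.uniform(-eps, eps, size=x0.shape)
    for _ in range(steps):
        g = grad_fn(x)
        x = x + alpha * np.sign(g)              # ascend the loss
        x = np.clip(x, x0 - eps, x0 + eps)      # project onto the eps-ball
    return x
```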